Predict development pressure: how do we define “a lot of development”?
Define affordability burden: how do we define “affordability burden”? – % change year over year in population that is experience rate burden (will probably see extreme tipping points), growing population, % change in area incomes
Identify problem zoning
Calculate number of connected parcels
Predict development pressure at the block level
Identify not burdened areas
Identify problem zoning
Calcualte number of connected parcels
Advocate for upzoning in parcels where there is local development pressure, no affordability burden, problem zoning, and high number of connected parcels
To begin, we run a simple regression incorporating three engineered groups of features: space lag, time lag, and distance to 2022. We include this last variable because of a Philadelphia tax abatement policy that led to a significant increase in residential development in the years immediately before 2022. We will use this as a baseline model to compare to our more complex models.
Show the code
permits_train <-filter(permits_bg %>%select(-mapname), year <2022)permits_test <-filter(permits_bg %>%select(-mapname), year ==2022)permits_validate <-filter(permits_bg %>%select(-mapname), year ==2023)reg <-lm(permits_count ~ ., data =st_drop_geometry(permits_train))predictions <-predict(reg, permits_test)predictions <-cbind(permits_test, predictions)predictions <- predictions %>%mutate(abs_error =abs(permits_count - predictions),pct_error = abs_error / permits_count)ggplot(predictions, aes(x = permits_count, y = predictions)) +geom_point() +labs(title ="Predicted vs. Actual Permits",subtitle ="2022") +geom_smooth(method ="lm", se =FALSE)
We find that our OLS model has an MAE of only MAE: 2.25–not bad for such a simple model! Still, it struggles most in the areas where we most need it to succeed, so we will try to introduce better variables and apply a more complex model to improve our predictions.
4.3 Random Forest Regression
Show the code
rf <-randomForest(permits_count ~ ., data =st_drop_geometry(permits_train),importance =TRUE, na.action = na.omit)rf_predictions <-predict(rf, permits_test)rf_predictions <-cbind(permits_test, rf_predictions)rf_predictions <- rf_predictions %>%mutate(abs_error =abs(permits_count - rf_predictions),pct_error = abs_error / (permits_count +0.0001))ggplot(rf_predictions, aes(x = permits_count, y = rf_predictions)) +geom_point() +labs(title ="Predicted vs. Actual Permits",subtitle ="2022") +geom_smooth(method ="lm", se =FALSE)
Furthermore, we can identify properties with high potential for assemblage, which suggests the ability to accomodate high-density, multi-unit housing.
Show the code
nbs <- filtered_zoning %>%mutate(nb =st_contiguity(geometry))# Create edge list while handling cases with no neighborsedge_list <- tibble::tibble(id =1:length(nbs$nb), nbs = nbs$nb) %>% tidyr::unnest(nbs) %>%filter(nbs !=0)# Create a graph with a node for each row in filtered_zoningg <-make_empty_graph(n =nrow(filtered_zoning))V(g)$name <-as.character(1:nrow(filtered_zoning))# Add edges if they existif (nrow(edge_list) >0) { edges <-as.matrix(edge_list) g <-add_edges(g, c(t(edges)))}# Calculate the number of contiguous neighbors, handling nodes without neighborsn_contiguous <-sapply(V(g)$name, function(node) {if (node %in% edges) {length(neighborhood(g, order =1, nodes =as.numeric(node))[[1]]) } else {1# Nodes without neighbors count as 1 (themselves) }})filtered_zoning <- filtered_zoning %>%mutate(n_contig = n_contiguous)filtered_zoning %>%st_drop_geometry() %>%select(rf_predictions, n_contig, OBJECTID, CODE) %>%filter(rf_predictions >10, n_contig >2) %>%arrange(desc(rf_predictions)) %>%kablerize(caption ="Poorly-Zoned Properties with High Development Risk")
Poorly-Zoned Properties with High Development Risk